Module 02 - Microservices with Python
The Monolith That Broke on a Tuesday
It starts innocently. Your document processing application handles uploads, runs OCR, classifies documents, sends notifications, and stores results - all in one Flask app. Deploys take 40 minutes. A bug in the notification code brings down OCR. A memory leak in the classification model crashes the upload handler. You cannot scale the CPU-heavy processing tier independently of the lightweight notification sender.
Then Tuesday comes. A spike in document uploads saturates the OCR workers. The entire application becomes unresponsive. Users cannot even check their upload history - that endpoint lives in the same overloaded process.
This module is about what you build instead, and more importantly, how and when to build it.
What You Will Learn
| Lesson | Topic | Key Skills Gained |
|---|---|---|
| 01 | FastAPI in Depth | DI patterns, lifespan events, middleware, exception handlers, OpenAPI customisation |
| 02 | gRPC with Python | Protocol Buffers, all four streaming patterns, interceptors, error mapping |
| 03 | Event-Driven Architecture | Kafka, Redis Streams, Event Sourcing, CQRS, Saga pattern |
| 04 | Service Mesh Patterns | Circuit breakers, retry with jitter, bulkheads, OpenTelemetry tracing |
| 05 | API Versioning and Contracts | Pact contract testing, schema evolution, SDK generation, deprecation |
Prerequisites: Python async fundamentals (Module 1 of this series), Docker basics, HTTP fundamentals.
Time commitment: ~12 hours of focused study, ~8 hours of hands-on project work.
The Migration: From Monolith to Four Services
The best way to understand microservice boundaries is to watch a real extraction. Here is DocumentProcessingMonolith - a class doing eight things that should never be owned by a single deployable unit.
```python
# BEFORE: The Monolith - one class, eight responsibilities
# Every deployment touches every capability.
# Every bug can affect every user.
# You cannot scale OCR independently of email sending.
from datetime import datetime, timezone


class DocumentProcessingMonolith:
    def __init__(self, db_conn, email_client, storage_client, ocr_engine, classifier):
        self.db = db_conn
        self.email = email_client
        self.storage = storage_client
        self.ocr = ocr_engine
        self.classifier = classifier

    # Helper methods _extract_metadata and _generate_thumbnail are omitted here.
    def process_document(self, file_bytes: bytes, user_id: str, filename: str) -> dict:
        # Responsibility 1: Validate input
        if len(file_bytes) > 50 * 1024 * 1024:
            raise ValueError("File too large")
        if not filename.endswith((".pdf", ".png", ".jpg")):
            raise ValueError("Unsupported format")

        # Responsibility 2: Store raw file
        storage_key = f"raw/{user_id}/{filename}"
        self.storage.put(storage_key, file_bytes)

        # Responsibility 3: Run OCR (CPU-heavy, 2–30 seconds)
        text = self.ocr.extract_text(file_bytes)

        # Responsibility 4: Extract metadata
        metadata = self._extract_metadata(file_bytes)

        # Responsibility 5: Generate thumbnail (also CPU-heavy)
        thumbnail = self._generate_thumbnail(file_bytes)
        self.storage.put(f"thumbs/{user_id}/{filename}.jpg", thumbnail)

        # Responsibility 6: Classify document (ML inference)
        label = self.classifier.classify(text)

        # Responsibility 7: Write audit log (timezone-aware UTC timestamp)
        self.db.execute(
            "INSERT INTO audit_log(user_id, filename, label, ts) VALUES (%s, %s, %s, %s)",
            (user_id, filename, label, datetime.now(timezone.utc)),
        )

        # Responsibility 8: Send notification email
        email = self.db.fetchone(
            "SELECT email FROM users WHERE id = %s", (user_id,)
        )["email"]
        self.email.send(
            to=email,
            subject="Your document is ready",
            body=f"Document '{filename}' classified as: {label}",
        )

        return {"storage_key": storage_key, "label": label, "text": text[:500]}
```
The problems are architectural, not stylistic. You cannot:
- Scale OCR (4 CPUs minimum) without also scaling email sending (almost zero CPU).
- Deploy a new ML classification model without touching the upload handler.
- Upgrade notification templates without running the full OCR regression suite.
- Have the ML team own the classifier independently of the infra team owning storage.
- Isolate a memory leak in the classifier from affecting uploads.
AFTER: Four Services with Clear Boundaries
```text
┌──────────────────────────────────────────────────────────────────────┐
│                         API Gateway / Nginx                          │
│           (TLS termination, auth, rate limiting, routing)            │
└──────────────────┬────────────────────────────────────┬──────────────┘
                   │ REST/JSON                          │ REST/JSON
                   ▼                                    ▼
    ┌──────────────────────────┐        ┌────────────────────────────┐
    │  Upload Service          │        │  Classification Service    │
    │  FastAPI · Python        │        │  FastAPI + gRPC · Python   │
    │                          │        │                            │
    │  • File validation       │        │  • ML model serving        │
    │  • Virus scanning        │        │  • Text → label mapping    │
    │  • S3/GCS storage        │        │  • Confidence scores       │
    │  • Publishes:            │        │  • Model version tracking  │
    │    doc.uploaded event    │        │  • gRPC for internal calls │
    │                          │        │                            │
    │  CPU: low  RAM: 256 MB   │        │  CPU: high  RAM: 2 GB      │
    │  Replicas: 2             │        │  Replicas: 4               │
    └──────────────┬───────────┘        └────────────────┬───────────┘
                   │                                     │
                   │     ┌─────────────────────────┐     │
                   │     │     Message Broker      │     │
                   └────►│          Kafka          │◄────┘
                         │                         │
                         │  Topics:                │
                         │  • doc.uploaded         │
                         │  • doc.processed        │
                         │  • doc.classified       │
                         └────────────┬────────────┘
                                      │
                ┌─────────────────────┴────────────────┐
                ▼                                      ▼
  ┌────────────────────────────┐        ┌─────────────────────────────┐
  │  Processing Service        │        │  Notification Service       │
  │  FastAPI · Python          │        │  FastAPI · Python           │
  │                            │        │                             │
  │  • OCR (pytesseract)       │        │  • Email (SendGrid)         │
  │  • PDF text extraction     │        │  • SMS (Twilio)             │
  │  • Thumbnail generation    │        │  • In-app push notifications│
  │  • Metadata extraction     │        │  • User preference lookup   │
  │  • Publishes:              │        │  • Template rendering       │
  │    doc.processed event     │        │                             │
  │                            │        │  CPU: low  RAM: 128 MB      │
  │  CPU: very high  RAM: 1 GB │        │  Replicas: 1                │
  │  Replicas: 8               │        │                             │
  └────────────────────────────┘        └─────────────────────────────┘
```
Each service:
- Owns its own data store - no shared database
- Deploys independently on its own CI/CD pipeline
- Scales based on its own resource bottleneck
- Is owned by one team with full autonomy
- Can be rewritten or replaced without affecting other services
This is the architecture you will build, piece by piece, across this module.
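Because the services above talk to each other only through broker topics, the event payload is effectively the contract between them. The following is a minimal sketch of what a `doc.uploaded` payload could look like. The `DocUploaded` class and its field names (`event_id`, `occurred_at`, `storage_key`) are illustrative assumptions, not a schema defined by this module.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DocUploaded:
    """Hypothetical payload for the doc.uploaded topic."""

    storage_key: str
    user_id: str
    filename: str
    # A unique id lets consumers detect and skip redelivered messages.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_bytes(self) -> bytes:
        """Serialise to UTF-8 JSON, a common wire format for broker messages."""
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "DocUploaded":
        """Rebuild the event on the consumer side."""
        return cls(**json.loads(raw.decode("utf-8")))


event = DocUploaded(
    storage_key="raw/u1/invoice.pdf", user_id="u1", filename="invoice.pdf"
)
restored = DocUploaded.from_bytes(event.to_bytes())
```

Carrying an identifier and a timestamp in every event is one common way to make consumers idempotent; a real schema would also need a versioning strategy.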
When to Use Microservices vs Monolith
This is the most consequential decision in this module. Most teams reach for microservices too early, creating distributed monoliths - all the operational complexity of distributed systems, with none of the independent deployment or scaling benefits.
The Decision Matrix
| Factor | Choose Monolith | Choose Microservices |
|---|---|---|
| Team size | Under 8 engineers | Multiple teams, 15+ engineers |
| Deployment frequency | Infrequent, coordinated | Teams deploy independently, 20+ times/day |
| Scale requirements | Roughly uniform | Wildly different per component |
| Domain clarity | Still discovering the model | Well-understood bounded contexts |
| Operational maturity | No Kubernetes expertise | Strong DevOps culture, observability in place |
| Data ownership | Shared DB acceptable | Clear ownership, teams own their schema |
| Development speed | Ship an MVP fast | Independent team velocity matters |
The rule: If your team cannot draw bounded contexts on a whiteboard without a 30-minute argument, you are not ready for microservices.
The Recommended Path: Modular Monolith First
Start with a modular monolith - well-separated modules inside one deployable, with strict interface boundaries. Extract to services when you have evidence of the need.
```python
# A modular monolith: one deployment, clean internal interfaces
# When you later extract to a service, you only change the adapter

# upload/ports.py - defines what upload module needs from outside
from abc import ABC, abstractmethod


class StoragePort(ABC):
    @abstractmethod
    async def put(self, key: str, data: bytes) -> str: ...


class EventPort(ABC):
    @abstractmethod
    async def publish(self, topic: str, event: dict) -> None: ...


# upload/service.py - business logic, no infrastructure knowledge
class UploadService:
    def __init__(self, storage: StoragePort, events: EventPort):
        self._storage = storage
        self._events = events

    async def upload(self, file_bytes: bytes, filename: str, user_id: str) -> str:
        key = f"raw/{user_id}/{filename}"
        await self._storage.put(key, file_bytes)
        await self._events.publish("doc.uploaded", {
            "key": key, "user_id": user_id, "filename": filename
        })
        return key


# In a monolith: events are in-process function calls
# When you extract: events become Kafka messages
# The UploadService code does not change - only the EventPort adapter changes
```
The EventPort abstraction is the key. Today it calls a function in-process. Tomorrow it sends a Kafka message. The service logic is identical.
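To make the "in-process today" half concrete, here is a minimal sketch of an adapter that satisfies `EventPort` by awaiting handlers directly. `InProcessEventBus`, its `subscribe` method, and the `Handler` alias are hypothetical names introduced for illustration; `EventPort` is repeated so the block runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class EventPort(ABC):
    @abstractmethod
    async def publish(self, topic: str, event: dict) -> None: ...


class InProcessEventBus(EventPort):
    """Monolith adapter: publish() awaits each subscribed handler in turn."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    async def publish(self, topic: str, event: dict) -> None:
        # No broker involved: delivery is a plain awaited function call.
        for handler in self._handlers.get(topic, []):
            await handler(event)


async def demo() -> list[dict]:
    received: list[dict] = []

    async def on_uploaded(event: dict) -> None:
        received.append(event)

    bus = InProcessEventBus()
    bus.subscribe("doc.uploaded", on_uploaded)
    await bus.publish("doc.uploaded", {"key": "raw/u1/a.pdf", "user_id": "u1"})
    return received


received_events = asyncio.run(demo())
```

A Kafka-backed adapter would implement the same `publish` signature by serialising the event and handing it to a producer client. Note one behavioural difference to plan for: in-process delivery is synchronous with the caller, while broker delivery is not.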
CAP Theorem: The Constraint Every Microservice Engineer Must Internalize
In a distributed system, you can guarantee at most two of these three properties simultaneously:
| Property | Meaning | Real-World Definition |
|---|---|---|
| Consistency | Every read reflects the latest write | All nodes return the same data at the same moment |
| Availability | Every request receives a response | System responds even when some nodes fail |
| Partition Tolerance | System operates despite network failures | Continues working when the network splits nodes |
Network partitions are inevitable in distributed systems. You will have network failures. So partition tolerance is not optional - the practical choice is always between C and A when a partition occurs:
- CP systems (Consistency + Partition Tolerance): Refuse to answer during a partition rather than return stale data. Examples: HBase, ZooKeeper, etcd, PostgreSQL with synchronous replication. Use for: distributed locks, financial transactions, leader election.
- AP systems (Availability + Partition Tolerance): Return potentially stale data rather than refuse to respond. Examples: Cassandra, DynamoDB, CouchDB. Use for: shopping carts, user sessions, activity feeds, search indexes.
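The difference between the two choices is easiest to see from the reader's side. The toy model below is a sketch only: the `Replica` class, its `mode` flag, and `PartitionError` are invented for illustration and do not correspond to any real database client.

```python
from dataclasses import dataclass, field


class PartitionError(Exception):
    """Raised when a CP replica cannot confirm it holds the latest write."""


@dataclass
class Replica:
    mode: str  # "CP" or "AP"
    data: dict = field(default_factory=dict)
    partitioned: bool = False  # True when cut off from the other nodes

    def read(self, key: str):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise PartitionError(f"cannot confirm latest value for {key!r}")
        # AP, or a healthy network: answer from local state, possibly stale.
        return self.data.get(key)


cp = Replica(mode="CP", data={"label": "invoice"}, partitioned=True)
ap = Replica(mode="AP", data={"label": "invoice"}, partitioned=True)

ap_answer = ap.read("label")  # served from local state during the partition

try:
    cp.read("label")
    cp_refused = False
except PartitionError:
    cp_refused = True  # the caller must handle the refusal, e.g. show an error
```

When the network is healthy, both replicas behave identically; the choice only becomes visible during a partition.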
In the document intelligence platform:
| Data | Choice | Reason |
|---|---|---|
| Classification label in read model | AP | Slight staleness is fine; user sees updated label within seconds |
| Billing record for processing charge | CP | Must be accurate; better to show an error than charge incorrectly |
| Audit log entries | AP with eventual consistency | Entries may become visible after a short delay; accepting every write matters more than immediate consistency |
| User session token | AP | Returning a slightly stale token is better than login failing |
The 8 Fallacies of Distributed Computing
L. Peter Deutsch and colleagues at Sun Microsystems catalogued eight assumptions that developers incorrectly make about networks (James Gosling is credited with adding the eighth). Each one will cause production incidents if ignored.
| # | Fallacy | The Truth | How to Defend Against It |
|---|---|---|---|
| 1 | The network is reliable | Packets are dropped, connections reset, routers fail | Retries with exponential backoff; idempotent operations |
| 2 | Latency is zero | Cross-datacenter calls: 5–100 ms; cross-pod: 0.5–2 ms | Async messaging; connection pooling; caching; batching |
| 3 | Bandwidth is infinite | Large payloads are expensive and slow | Pagination; compression (gzip/brotli); streaming; binary protocols |
| 4 | The network is secure | Traffic can be intercepted, replayed, spoofed | mTLS between services; service-to-service JWT auth; encryption at rest |
| 5 | Topology doesn't change | IPs change when pods restart; services autoscale in and out | DNS-based service discovery; health checks; graceful connection draining |
| 6 | There is one administrator | Multiple teams deploy conflicting changes simultaneously | API contracts; consumer-driven contract testing; feature flags |
| 7 | Transport cost is zero | Serialisation, TLS handshakes, HTTP overhead all accumulate | gRPC (binary, multiplexed) for high-frequency internal calls |
| 8 | The network is homogeneous | Different languages, OS, protocol versions, MTU sizes | Standard protocols (HTTP/2, gRPC); schema registries; protocol buffers |
By the end of this module, you will have written Python code that defends against all eight.
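As a first taste of the defence listed for fallacy 1, here is a minimal standard-library sketch of retries with exponential backoff and full jitter. The function name `retry_with_backoff` and its parameters are assumptions for illustration. Treating only `ConnectionError` as retryable is also a simplification.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
) -> T:
    """Retry `call` on ConnectionError, sleeping a random, growing delay."""
    for attempt in range(attempts):
        try:
            return await call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential cap, then full jitter so clients do not retry in sync.
            cap = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, cap))
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


async def flaky() -> str:
    """Simulated remote call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "ok"


result = asyncio.run(retry_with_backoff(flaky, base_delay=0.001))
```

Retrying is only safe when the remote operation is idempotent, which is why the table pairs the two. Libraries such as `tenacity` (installed in the setup below) package the same idea with more options.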
How Python Fits Into Polyglot Microservice Architectures
Python is rarely the only language in a mature microservice shop. Understanding how Python services interoperate with Go, Java, and Rust services is essential.
```text
Polyglot Production Architecture
─────────────────────────────────

┌──────────────────┐   gRPC (proto)   ┌──────────────────────────┐
│ API Gateway      │─────────────────►│ Auth Service (Go)        │
│ Python / FastAPI │                  │ ~1 ms latency, 64 MB RAM │
└────────┬─────────┘                  └──────────────────────────┘
         │
         │ REST / JSON
         ▼
┌──────────────────┐   Kafka events   ┌──────────────────────────┐
│ Document Service │─────────────────►│ ML Pipeline (Python)     │
│ Python / FastAPI │                  │ PyTorch inference server │
└────────┬─────────┘                  └──────────────────────────┘
         │
         │ REST / JSON
         ▼
┌──────────────────┐   gRPC (proto)   ┌──────────────────────────┐
│ Search Service   │─────────────────►│ Index Builder (Java)     │
│ Python / FastAPI │                  │ Lucene, needs heavy JVM  │
└──────────────────┘                  └──────────────────────────┘
```
Python wins in microservice architectures for:
- ML and data pipelines (PyTorch, scikit-learn, pandas have no peer)
- API gateway services (FastAPI with uvicorn is usually fast enough for I/O-bound work, where time is spent waiting on the network rather than in the interpreter)
- Scripting and orchestration (calling and coordinating other services)
- Rapid prototyping (fastest path from idea to deployed service)
Python loses to Go/Rust when:
- CPU-intensive hot paths need true parallelism (the GIL is a real constraint)
- Memory is severely constrained (Python interpreter overhead is ~30–50 MB baseline)
- Connection counts exceed ~10,000 concurrent (Go goroutines outperform asyncio at this scale)
The practical pattern: Python for business logic and ML inference; Go or Rust for the network-critical, CPU-intensive hot paths; all connected via gRPC and Kafka.
Module Project: The Document Intelligence Platform
Each lesson constructs one piece of a complete, deployable Document Intelligence Platform.
| Lesson | Service Built | Technology Highlighted |
|---|---|---|
| 01 - FastAPI in Depth | Upload Service | DI, lifespan, middleware, background tasks |
| 02 - gRPC with Python | Classification Service | Proto definition, streaming RPC, interceptors |
| 03 - Event-Driven Architecture | Event backbone | Kafka topics, event sourcing, saga pattern |
| 04 - Service Mesh Patterns | Resilience layer | Circuit breakers, tracing, health checks |
| 05 - API Versioning and Contracts | Versioned API | Pact tests, schema evolution, client SDK |
The code in each lesson is production-quality. It handles errors, logs correctly, and is structured for testability. You can deploy it.
Environment Setup
```bash
# Project structure
mkdir -p doc-intelligence/{upload-service,classification-service,processing-service,notification-service,shared,protos}
cd doc-intelligence

# Python environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Core service dependencies
pip install fastapi "uvicorn[standard]" httpx pydantic "pydantic-settings"
pip install grpcio grpcio-tools protobuf
pip install kafka-python confluent-kafka redis
pip install opentelemetry-api opentelemetry-sdk
pip install opentelemetry-instrumentation-fastapi
pip install opentelemetry-exporter-otlp
pip install tenacity pact-python

# Infrastructure via Docker Compose
cat > docker-compose.yml << 'YAML'
version: "3.9"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      # Single broker: the internal offsets topic cannot use the default factor of 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: documents
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    ports:
      - "5432:5432"
  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC receiver
      - "4318:4318"    # OTLP HTTP receiver
YAML

docker compose up -d
echo "Infrastructure starting. Check status with: docker compose ps"
```
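`docker compose up -d` returns once the containers have been started, which is not the same as the services accepting connections. The sketch below is one way to check the published ports from Python using only the standard library. The helper names `port_open` and `check_all` are hypothetical, and the port numbers are taken from the Compose file above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Host ports published by the Compose file.
SERVICES = {"kafka": 9092, "redis": 6379, "postgres": 5432, "jaeger-ui": 16686}


def check_all(host: str = "localhost") -> dict[str, bool]:
    """Probe every published port and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name:<10} {'up' if ok else 'DOWN'}")
```

An open port only shows that a process is listening. Kafka in particular may accept TCP connections a few seconds before it is ready to serve clients, so treat this as a quick smoke test rather than a health check.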
Navigating This Module
Each lesson is self-contained - you can study gRPC without having read the FastAPI lesson. But the project connects them all. Recommended approach:
- Read the lesson once, end to end, skimming code to understand structure
- Reproduce every code example from scratch (not copy-paste) - this is where understanding forms
- Build the mini-project at the end of each lesson
- Integrate your service with the one built in the previous lesson
The fastest path to mastering distributed systems is building a small one and watching it fail in interesting ways.
Let's begin.
